Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank
Abstract
Discourse parsing has long been treated as a stand-alone problem independent from constituency or dependency parsing. Most attempts at this problem are pipelined rather than end-to-end, sophisticated, and not self-contained: they assume gold-standard text segmentations (Elementary Discourse Units) and use external parsers for syntactic features. In this paper we propose the first end-to-end discourse parser that jointly parses at both the syntax and discourse levels, as well as the first syntacto-discourse treebank, created by integrating the Penn Treebank with the RST Treebank. Built upon our recent span-based constituency parser, this joint syntacto-discourse parser requires no preprocessing whatsoever (such as segmentation or feature extraction) and achieves state-of-the-art end-to-end discourse parsing accuracy.
Similar Resources
The WeSearch Corpus, Treebank, and Treecache - A Comprehensive Sample of User-Generated Content
We present the WeSearch Data Collection (WDC)—a freely redistributable, partly annotated, comprehensive sample of User-Generated Content. The WDC contains data extracted from a range of genres of varying formality (user forums, product review sites, blogs and Wikipedia) and covers two different domains (NLP and Linux). In this article, we describe the data selection and extraction process, with...
Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies
We investigate aspects of interoperability between a broad range of common annotation schemes for syntacto-semantic dependencies. With the practical goal of making the LinGO Redwoods Treebank accessible to broader usage, we contrast seven distinct annotation schemes of functor–argument structure, both in terms of syntactic and semantic relations. Drawing examples from a multi-annotated gold sta...
A Constituent-Based Approach to Argument Labeling with Joint Inference in Discourse Parsing
Discourse parsing is a challenging task and plays a critical role in discourse analysis. In this paper, we focus on labeling full argument spans of discourse connectives in the Penn Discourse Treebank (PDTB). Previous studies cast this task as a linear tagging or subtree extraction problem. In this paper, we propose a novel constituent-based approach to argument labeling, which integrates the a...
Ubertagging: Joint Segmentation and Supertagging for English
A precise syntacto-semantic analysis of English requires a large detailed lexicon with the possibility of treating multiple tokens as a single meaning-bearing unit, a word-with-spaces. However parsing with such a lexicon, as included in the English Resource Grammar, can be very slow. We show that we can apply supertagging techniques over an ambiguous token lattice without resorting to previousl...
Identifying Pathological Findings in German Radiology Reports Using a Syntacto-semantic Parsing Approach
In order to integrate heterogeneous clinical information sources, semantically correlating information entities have to be linked. Our discussions with radiologists revealed that anatomical entities with pathological findings are of particular interest when linking radiology text and images. Previous research to identify pathological findings focused on simplistic approaches that recognize dise...